ABSTRACT


When actual green vegetation weights were regressed against
capacitance meter outputs, the linear model more frequently
explained a greater proportion of the variance in
vegetation weights than did the logarithmic model.
Examination of the residual plots, however, indicated that
there may often be a problem of nonconstant variance.
While the linear model should ordinarily be used for
predictions of green vegetation weights, sometimes a more
extensive analysis is necessary to determine the
appropriate model. The R² values and Furnival’s Index indicated a
better fit for the logarithmic model. Comparisons using R²
values and Furnival’s Index should be used cautiously.
Rising labor costs are continually increasing the
need to develop less labor-intensive means of conducting 
analyses of rangeland vegetation. One device that has 
recently gained widespread attention for this purpose is 
the electronic capacitance meter, which allows quick, 
efficient, and nondestructive estimation of forage produc- 
tion (or biomass) based on the direct relationship between 
vegetation weight and capacitance (Fletcher and Robinson 
1956; Neal and Neal 1973). These meters are simple to 
use, allow rapid sampling of an area because only a small, 
separate sample of representative vegetation need be 
clipped and weighed, and are useful under a variety of 
rangeland conditions (Currie and others 1973; Morris and 
others 1976; Neal and others 1976; Platts and Nelson 
1983). 


¹The authors are with the Intermountain Research Station. Rodger L.
Nelson, biological technician, and William S. Platts, research fisheries
biologist, are at the Forestry Sciences Laboratory in Boise, ID. Charles K. 
Graham, statistician, is at Ogden headquarters. 


Relative Suitabilities of Regression Models in Electronic Analysis of Riparian Vegetation


Use of the electronic capacitance meter depends upon 
the relationship between vegetation weights (green or dry 
weight) and electronic capacitance as measured by the 
meter. Double-sampling techniques (Cochran 1963) have 
been used to establish a relationship between the small 
clipped and weighed secondary samples and the un- 
weighed primary sample. A linear regression model has 
typically been used to describe the relationship between 
vegetation weight and electronic capacitance and has 
generally provided an adequate description (Back and 
others 1969; Neal and Neal 1973; Platts and Nelson 
1983). Recently, some researchers (Terry and others 
1981) have suggested that taking a logarithmic transfor- 
mation of both vegetation weight and electronic capaci- 
tance and fitting a linear regression model to these trans- 
formed variables may result in increased precision. This 
model is nonlinear in the original units and will be re- 
ferred to as the logarithmic model, whereas the model 
using untransformed variables will be referred to as the 
linear model. 

Much of the work supporting use of the logarithmic
model consists of a comparison of coefficients of
determination (R²), a questionable procedure whose
indiscriminate use is discouraged by many writers. Draper and
Smith (1981), for example, state that adjusted R² values
(adjusted for different degrees of freedom) may be used to
compare equations from different sets of data, but only as 
an initial gross indicator. Rodriguez (1982) states that 
caution should be used in judging the explanatory power 
of a nonlinear fit when transformations are involved. 
Additionally, most studies were conducted in limited 
areas on similar vegetation, precluding evaluation of 
their generality. 

We have successfully used electronic capacitance meters
in riparian vegetation since 1979 and have assembled
an extensive data base from a variety of geographic and
riparian settings in the Intermountain West and under
extremely variable climatic conditions. This data base
was used to examine the relative merits of the two models,
one using untransformed variables and the other
using logarithms of both variables. Comparison of these
regression models will (1) determine the relative
precision of each for estimating vegetation weights from
meter readings, (2) determine which is the more generally
applicable relationship, and (3) help investigators choose
the more appropriate model under local conditions.


STUDY AREAS AND METHODS 


We conducted herbage meter studies in three river 
drainages in south-central Idaho, two in northeastern 
Nevada, and two in northeastern and south-central Utah. 
The study areas in Idaho were in forested meadows of the 
Rocky Mountain Forest Province (Bailey 1980) and were 
characterized by well-developed riparian zones. The 
study areas in Nevada and Utah were in the Great Basin 
on the perimeter of the Intermountain Sagebrush Prov- 
ince (Bailey 1980) and were characterized by narrow, 
poorly developed riparian zones into which xerophytic 
vegetation has frequently invaded. 

Sampling in the Idaho study areas almost exclusively 
included riparian vegetation, chiefly willows (Salix spp.), 
sedges (Carex spp.), and tufted hairgrass (Deschampsia
caespitosa). When nonriparian vegetation reached the
water’s edge, we included in the samples such species as 
Idaho fescue (Festuca idahoensis) and timber danthonia 
(Danthonia intermedia). Because riparian zones in the 
Great Basin study areas were relatively narrow, such 
terrestrial species as cheatgrass (Bromus tectorum) and 
big sagebrush (Artemisia tridentata) were frequently 
included in the sample with the typical riparian willows, 
sedges, and grasses. No attempt was made to determine 
the actual species composition of the samples. The above 
merely describes the difference in character between the 
geographic locations. 

We measured sample plots using either a Neal
Electronics² Model 18-2000 or 18-3000 electronic capacitance
herbage meter. We determined vegetal capacitance of 
each sample plot by taking the average of three successive 
readings on the plot. Vegetation included in the sample 
was selected to provide a wide distribution of capacitance- 
weight points for fitting the regression models. The 
overall sample design was a double-sampling scheme 
(Cochran 1963) and is discussed in detail in Platts and 
others (1987) and Platts and Nelson (1983). Sample size 
varied with the size of the primary sample, with approxi- 
mately one weighed sample plot for every four or five 
biomass sample plots. 

Green vegetation weights (in grams) were regressed 
against meter readings (dimensionless) according to the 
linear model: 


     Y = a_0 + b_0 X                                  (1)

where:

     Y   = predicted green vegetation weight
     X   = meter reading
     a_0 = intercept
     b_0 = slope

and according to the logarithmically transformed model:

     Ln(Y) = a_1 + b_1 Ln(X)                          (2)

where:

     Ln(Y) = natural log of predicted green weight
     Ln(X) = natural log of meter reading
     a_1   = intercept of the transformed data model
     b_1   = slope of the transformed data model.

²The use of trade or firm names in this publication is for reader
information and does not imply endorsement by the U.S. Department of
Agriculture of any product or service.


Zero points, corresponding to calibration of the machine 
to no-yield conditions, were eliminated in both regres- 
sions. This represents a departure from usual methods 
(Platts and Nelson 1983). 
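
The two fits described above can be sketched in a few lines of Python. This is a minimal illustration only, not the software used in the study: the helper name fit_line and the calibration values are hypothetical, and ordinary least squares is assumed for both models.

```python
import math


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


# Hypothetical calibration data: meter readings and clipped green weights (g).
readings = [0.0, 12.0, 20.0, 31.0, 45.0, 60.0]
weights = [0.0, 25.0, 38.0, 64.0, 88.0, 121.0]

# Drop the zero (no-yield) calibration point from both regressions;
# the logarithm is undefined at zero in any case.
pairs = [(x, y) for x, y in zip(readings, weights) if x > 0 and y > 0]
xs = [p[0] for p in pairs]
ys = [p[1] for p in pairs]

# Linear model, equation (1): intercept and slope in original units.
a0, b0 = fit_line(xs, ys)

# Logarithmic model, equation (2): the same fit on Ln(X) and Ln(Y).
a1, b1 = fit_line([math.log(x) for x in xs], [math.log(y) for y in ys])
```

Both models are fitted to the same plots, so any later comparison between them reflects the model form rather than a difference in the data.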

The primary measure used to assess the relative preci- 
sion of each regression model for describing the relation- 
ship between meter readings and vegetation weights was 
the sum of the squared deviations from regression (SSD). 
These were calculated by determining the values from 
each predictive model, subtracting these predicted values 
from the actual values, and squaring the differences. 
Finally these squared differences were added to obtain a 
total for each sampled area. This is a natural measure of 
how well a model can predict, and we feel it is the most 
informative and appropriate. 

Calculating SSD was straightforward for the linear
model, but the logarithmic model required a
retransformation back to the original units. A direct
retransformation by antilogs results in biased estimates, so we
applied the correction formula recommended by Baskerville (1972).
(This correction was not large, however, and direct
retransformation produced similar results.) The smaller the
sum of squared deviations from regression, the better the fit.
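
A sketch of the SSD calculation for both models follows. It assumes that the Baskerville (1972) correction takes its usual form, adding half the residual variance of the log-scale regression to the predicted logarithm before taking the antilog. The function names and test data are hypothetical.

```python
import math


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def ssd_linear(x, y):
    """Sum of squared deviations (original units) for the linear model."""
    a, b = fit_line(x, y)
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))


def ssd_log(x, y, correct_bias=True):
    """SSD in original units for the log-log model after retransformation.

    With correct_bias=True, half the residual mean square of the log-scale
    fit is added before the antilog (assumed form of the correction).
    """
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    a, b = fit_line(lx, ly)
    residuals = [lyi - (a + b * lxi) for lxi, lyi in zip(lx, ly)]
    variance = sum(r * r for r in residuals) / (len(x) - 2)
    shift = variance / 2.0 if correct_bias else 0.0
    predicted = [math.exp(a + b * lxi + shift) for lxi in lx]
    return sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))
```

Because both functions return sums in grams squared, their values for the same set of plots can be compared directly, which is what makes SSD usable across the two models.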

The residuals were also examined to determine whether 
one of the assumptions of linear model fitting was vio- 
lated, namely the assumption of constant variance. This 
was done by plotting the residuals against the predicted 
values. If the variance was not constant, it was expected 
to increase as the green vegetation weight became larger. 
Consequently, the plots of the residuals from the untrans- 
formed data were examined to determine if the absolute 
value of the residuals increased for larger values of pre- 
dicted green vegetation weights. The plots obtained from 
the transformed model were compared with the plots from 
the untransformed model to determine if the residuals 
were more uniform throughout and also to see if the 
absolute values of the residuals in the transformed model 
decreased as the expected values increased. If they did
decrease, the transformation had itself introduced a
nonconstant variance where the variance was approximately
constant before transformation.
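
The residual plots in this study were judged by eye. As a rough numeric companion to that inspection, and not a procedure used by the authors, one can correlate the absolute residuals with the predicted values: a clearly positive value suggests spread that grows with predicted weight. The function names and data below are hypothetical.

```python
import math


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def pearson(u, v):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    suv = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    suu = sum((a - mean_u) ** 2 for a in u)
    svv = sum((b - mean_v) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


def abs_residual_trend(x, y):
    """Correlation between predicted values and absolute residuals."""
    a, b = fit_line(x, y)
    predicted = [a + b * xi for xi in x]
    abs_residuals = [abs(yi - pi) for yi, pi in zip(y, predicted)]
    return pearson(predicted, abs_residuals)


# Hypothetical data whose scatter widens as the meter reading increases.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.5, 3.0, 7.5, 6.0, 12.5, 9.0, 17.5, 12.0]
trend = abs_residual_trend(xs, ys)
```

Applying the same function to Ln(X) and Ln(Y) gives the corresponding value for the transformed model; a clearly negative value there would correspond to the case in which the transformation itself introduced nonconstant variance.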

Other comparisons of the two models included here are
(1) coefficients of determination, R², and (2) Furnival’s
Indices (Furnival 1961). These are included only because
they have been widely used by other researchers and it is
of interest to evaluate the validity of these methods.
Coefficients of determination do not directly measure the fit of
the regression relationship to the data. Rather, they
measure the proportion of variation in the response
variable that can be attributed to its regression on the
explanatory variable. Consequently, higher R² values
indicate a better explanation of variation in the response
variable, but comparison of R² values from models having
different dependent variables (even when the difference
results from transformation) is discouraged.

Furnival’s Index (I) is an attempt to allow comparisons
of residual errors among regression models when the
dependent variables differ. It adjusts the standard errors
to facilitate these comparisons. From the linear model, I
is identical to the standard error of estimate, but from the
logarithmic model, I is calculated as:

     I = S_{y.x} (e^{\bar{y}})                        (3)

where:

     S_{y.x}     = the standard error of estimate
     e^{\bar{y}} = the antilog of the mean of the natural
                   logarithms of vegetation weight.

As with standard errors of the estimate, lower values of I
indicate a better fit between the model and the observed
data.

Neither R² nor Furnival’s Index directly measures how
well a model predicts.
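
A short sketch of the index calculation follows. It assumes that the multiplier in equation (3) is the antilog of the mean of Ln(Y), that is, the geometric mean weight, which is how Furnival's Index is usually computed for a log-transformed response. The function names are hypothetical.

```python
import math


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def std_error_of_estimate(x, y):
    """Square root of the residual mean square, with n - 2 degrees of freedom."""
    a, b = fit_line(x, y)
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(sse / (len(x) - 2))


def furnival_linear(x, y):
    """For the untransformed model the index is the standard error itself."""
    return std_error_of_estimate(x, y)


def furnival_log(x, y):
    """Log-scale standard error times the geometric mean of the weights."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    geometric_mean = math.exp(sum(ly) / len(ly))
    return std_error_of_estimate(lx, ly) * geometric_mean
```

Multiplying by the geometric mean puts the log-scale standard error back into units of weight, so both indices scale with the response in the same way. That shared scale is what allows them to be set side by side.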


RESULTS 


Results of the study indicate a need for careful
consideration of the merits of each model before selecting
one for use.


Sums of Squared Deviations 


An examination of the sums of the squared deviations 
from the regression model showed a clear superiority of 
the linear model over the logarithmic model. The data 
from the locations in Idaho resulted in 40 out of 52 (or 77 
percent) linear regressions having a lower SSD than the 
logarithmic regressions. In the more arid regions of Utah 
and Nevada, the results were even more favorable for the 
linear model. Here 23 out of 27 (or 85 percent) linear
regressions had a lower SSD than the logarithmic
regressions. Contingency table analyses of these results
indicate significantly more favorable results (p<0.01) for
the linear model in both geographic regions.


Residual Plots 


The results from the examination of the residual plots 
were ambiguous. Admittedly, examining these plots was 
somewhat subjective, and more data points from some 
locations would have helped. Nevertheless, 25 out of 52
locations in Idaho appeared to produce more uniform
residual plots after the log transformation. In the Great
Basin this number was 12 out of 27. Thus, in almost half
of the cases, the log transformation seemed to help correct
for nonconstant variances.
On the other hand, in over half the cases it appeared that 
a correction was unnecessary. This ratio was about the 
same in Idaho as in the more arid locations in the Great 
Basin. 


Coefficients of Determination 


When the R² values were examined, there were only 12
out of 52 (or 23 percent) cases in Idaho where the linear
model had higher R² values than the logarithmic model.
This is opposite of what was found by examining the sums
of squared deviations. In the Great Basin, however, 15
out of 27 (or 56 percent) showed higher R² values for the
linear models. (Mean values are presented in table 1.)
The differences in the means of the coefficient
values were not found to be significant. The logarithmic
model appeared to provide the better fit in Idaho based on
this criterion, whereas the linear model seemed more
suitable in the Great Basin sites. There also appeared to
be a difference in fit that was related to the year of
sampling.

The proportion of logarithmic R² values exceeding the
linear R² values was tested to determine if this value
differed from 0.5. The proportion of cases in Idaho was
significantly greater than 0.5 (p<0.01), but the proportion
of cases in the Great Basin was not significantly different
from 0.5 (p>0.05). It should be recalled that comparison
of R² values for different dependent variables is
discouraged and that these were made for comparison with the
work of other researchers. Nevertheless, the results of
the R² values were opposite of what was expected after
examining the SSD for both locations, and particularly for
locations in Idaho.
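
The text does not name the test used for these proportions. As one possible illustration, an exact two-sided binomial test against 0.5 can be applied to the counts reported above. The choice of test and the function name are assumptions, not the authors' procedure.

```python
from math import comb


def binomial_two_sided_p(successes, trials):
    """Exact two-sided p-value for H0: proportion = 0.5.

    Doubles the smaller tail probability and caps the result at 1.
    """
    total = 2 ** trials
    lower = sum(comb(trials, k) for k in range(0, successes + 1)) / total
    upper = sum(comb(trials, k) for k in range(successes, trials + 1)) / total
    return min(1.0, 2.0 * min(lower, upper))


# Cases in which the logarithmic R² exceeded the linear R²:
# Idaho, 52 - 12 = 40 of 52; Great Basin, 27 - 15 = 12 of 27.
p_idaho = binomial_two_sided_p(52 - 12, 52)
p_great_basin = binomial_two_sided_p(27 - 15, 27)
```

Under this test the Idaho proportion differs from 0.5 well below the 0.01 level, while the Great Basin proportion does not differ at the 0.05 level, in line with the conclusions stated in the text.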


Table 1. Mean linear and logarithmic model coefficients of determination (R²) by geographic region and year of sampling

                                          Mean R²
Study            1979          1980          1981          1982          1983          Total
areas          Lin   Log     Lin   Log     Lin   Log     Lin   Log     Lin   Log     Lin   Log
Idaho          0.70  0.78    0.83  0.84    0.74  --      0.86  0.89    0.74  0.86    0.78  0.84
Great Basin     .90   .69     .83   .83     .81  --       .85   .87     .78   .80     .83   .80
Combined        .76  [?]      .83   .84    [?]   --       .86   .88     .76   .84    [?]    .83


Table 2. Mean linear and logarithmic model values of Furnival’s Index (I) by geographic region and year of sampling

                                          Mean I
Study            1979          1980          1981          1982          1983          Total
areas          Lin   Log     Lin   Log     Lin   Log     Lin   Log     Lin   Log     Lin   Log
Idaho          33.9  20.0    20.0  13.5    27.7  --      33.2  25.8    25.4  17.4    27.9  19.6
Great Basin    15.6  19.2    43.3  34.0    [?]   --      29.2  16.6    28.2  19.4    24.2  28.3
Combined       28.7  19.8    26.2  19.0    20.4  --      31.6  22.0    26.4  18.1    26.6  19.2


Furnival’s Index 


Furnival’s Index (1961) has also been used by other
researchers to compare models (Terry and others 1981).
The purpose of this index (I) is to adjust the standard
errors of regression models with different dependent
variables to allow comparison among them. The model
producing a lower value of I would be interpreted as the
model that best fits the data.

The results from this index were more dramatically in
favor of the logarithmic model than were comparisons of
R². In Idaho 42 out of 52 (or 81 percent) produced lower
indices for the logarithmic model, and in the Great Basin
only 18 out of 27 (or 67 percent). Again, these results
were unexpected after examining the SSD values.

Also of interest is the fact that Furnival’s Index indi- 
cated that the transformed model was better 13 times in 
79 cases. However, we found no evidence from the SSD’s, 
residual plots, or the scatter diagrams of the original data 
to support this result. 

Table 2 shows average values of I for both regression
models by year and geographic location. Overall values of
I were significantly lower (p<0.001) when data were fitted
to the logarithmic model. In only two cases did the linear
model produce lower I values: Great Basin in 1979 and
1981. It is interesting that these two exceptional
incidents occurred in the more arid Great Basin study areas,
and during two unusually dry years.


DISCUSSION 


For most of the conditions under which we have used 
the electronic capacitance meter with double-sampling, 
the linear regression model provided a more accurate 
means of predicting green weights based upon the SSD. 
Each squared deviation measures how far the predicted 
value deviates from the observed value, and their sum 
provides an overall measure of how far the model predic- 
tions deviate from observed values. The sum of squared 
deviations from the regression model is a clear, natural, 
and powerful measure of how well the model predicts and 
was considered as the overriding criterion for determining 
which regression model provided better predictions. 

However, the assumption of a uniform variance is also 
extremely important in model fitting. Examination of the 
residual plots indicated that this was a problem in about 
half the cases that we examined. A logarithmic transfor- 
mation of the data can sometimes be used to help correct 
this problem. Our results are thus somewhat ambiguous, 
and it may often be difficult to determine which model
should be used. Under ordinary circumstances, when a
quick prediction of vegetation weight is wanted, we see 
little advantage in using the logarithmic model because it 
is more complicated conceptually and more difficult to 
compute. In addition, when transformations are used, 
direct retransformations result in biased predictions re- 
quiring the use of correction factors. 

In some situations an extensive analysis to determine
the appropriate model may be justified. Under these
circumstances we recommend a careful examination of
the data using scatter plots of both the data and the
residuals as well as examination of the SSD. It may be
important to determine which transformation corrects for
a nonconstant variance and, after transformation, whether a
linear model fits the transformed data or a nonlinear
model is indicated. This procedure can become rather
complicated and will not be discussed at length here.

Interestingly, the linear model worked better in arid 
regions than in wetter areas and in drier years. On the 
other hand, the logarithmic model produced its best re- 
sults in wetter regions and in wetter years. Thus, 
weather conditions may influence the shape of the curve 
and influence the choice of the more appropriate model. 

Investigators should always retain the responsibility of 
comparing the fit of both models to their data and select- 
ing the more appropriate one. 

Comparison of coefficients of determination (R²) derived
from each model is not a recommended procedure because
the dependent variables are not identical. This method
has often been used, but our results verify that it is
nothing more than a rough indicator. Even so, we expected
the R² values to produce the same general results as the
sums of the squared deviations. When the magnitudes of
the R² values were examined, however, they showed no
clear-cut superiority for the logarithmic model, although
the logarithmic model produced higher values of R² more
frequently. Thus, based on the comparison of R² values,
we would have little reason to reject use of the linear
model for most routine analyses.

An examination of the values obtained from Furnival’s
Index implied a clear superiority of the logarithmic model
over the linear model. This was a result of the
magnitudes of the I values as well as the fact that the I values
were generally lower for the logarithmic model. Again we
expected Furnival’s Index to produce the same results as the
SSD, and our results were surprising. These results,
together with the 13 cases where Furnival’s Index indicated a
transformation and other analyses indicated a transformation
was unnecessary, suggest a careful reexamination of
routine use of this index.


The question of the significance of sampling riparian
vegetation needs further evaluation. Vegetation moisture
may affect the relative suitability of the
two models because the logarithmic model tended to be
more effective in the wetter study areas of Idaho. This
would indicate that plant phenology may have
a bearing on model selection, as may the range type (for
example, riparian or upland range). We are currently
considering studies that will help answer these questions. 
Meanwhile, the choice of which model to use requires 
careful thought and analysis. 